Posts by Tags

Python

Weibo User Profile Spider

1 minute read

Published:

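This is a reference implementation of a spider that collects Weibo user profile data.

Scraping Weibo user profile information with Python

1. Variables collected: gender, Sunshine Credit (阳光信用), birthday (zodiac sign), registration date, bio, verification status, verification details, follower count, following count, post count, and SVIP level.

2. The code extends and modifies code by GitHub user inspurer.

3. The code is for reference only; use it only within the limits of the law.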
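To make the variable list above concrete, here is a minimal sketch of how one scraped profile could be flattened into a fixed set of columns and written out as CSV. It is not the post's own code: the column names, the `flatten_profile` and `profiles_to_csv` helpers, and the sample records are all hypothetical, and the sketch makes no network requests and assumes nothing about Weibo's actual response format.

```python
import csv
import io

# Hypothetical column names mirroring the variables listed above.
PROFILE_COLUMNS = [
    "user_id", "gender", "sunshine_credit", "birthday", "registration_date",
    "bio", "is_verified", "verification_info",
    "followers_count", "following_count", "posts_count", "svip_level",
]


def flatten_profile(user_id, raw):
    """Map one raw profile dict onto the fixed columns; missing keys become ''."""
    record = {column: raw.get(column, "") for column in PROFILE_COLUMNS}
    record["user_id"] = user_id
    return record


def profiles_to_csv(raw_profiles):
    """Serialize a {user_id: raw_profile} mapping to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=PROFILE_COLUMNS)
    writer.writeheader()
    for user_id, raw in raw_profiles.items():
        writer.writerow(flatten_profile(user_id, raw))
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = {
    "1000000001": {"gender": "f", "followers_count": 120, "is_verified": False},
    "1000000002": {"gender": "m", "followers_count": 45, "svip_level": 3},
}
csv_text = profiles_to_csv(sample)
print(csv_text)
```

Fixing the column order up front keeps every row aligned even when a profile is missing some fields, which is common for optional items such as verification details or SVIP level.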

```python
import requests
import pandas as pd
from time import sleep
import time
import json
```

R

R|convert rtf to csv

less than 1 minute read

Published:

```R
# install.packages("readtext")
library(readtext)

write_rtf_data_to_csv <- function(input_folder, output_folder, output_file) {
  # Get the list of RTF files in the input folder
  rtf_files <- list.files(path = input_folder, pattern = "\\.rtf$", full.names = TRUE)

  # Initialize an empty data frame to store the contents of all files
  all_content <- data.frame(content = character(), stringsAsFactors = FALSE)

  # Loop over the RTF files and read in the contents
  for (i in seq_along(rtf_files)) {
    file_content <- readtext(rtf_files[i])
    # Column 2 of the readtext result holds the document text
    content <- strsplit(file_content[[2]], split = "\n\n\n\n\n\n \n")
    df_content <- data.frame(content, stringsAsFactors = FALSE)
    colnames(df_content) <- c("content")
    all_content <- rbind(all_content, df_content)
  }

  # Write the combined data frame to a CSV file
  write.csv(all_content, file.path(output_folder, output_file), row.names = FALSE)
}
```

Spider

Weibo User Profile Spider


Weibo

Weibo User Profile Spider


csv

R|convert rtf to csv


rtf

R|convert rtf to csv
