rbind error while retrieving results #39
Comments
Sorry, it's my first post and I blew it while pressing CTRL+RETURN. My bad :)
@jmecahansen, I haven't dug into your zip, and this may not be the fix, but have you experimented with Sys.sleep() or other pauses in your loop to ensure you're not hitting API call limits? Also, have you switched to your own Google Cloud project, or are you still using the default scopes/client/secret etc.? I've dealt with similar issues, but haven't tried to automate it, as it's not a very frequent need and I've found it faster to just trial-and-error "by hand" with larger jumps in the max rows...
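For reference, the kind of pacing I mean looks roughly like this (an untested sketch; the site URL, dates and dimensions are placeholders to adapt to your own setup):

library(searchConsoleR)
scr_auth()  # authenticate before querying

dates <- seq(Sys.Date() - 10, Sys.Date() - 3, by = "day")
results <- vector("list", length(dates))

for (d in seq_along(dates)) {
  results[[d]] <- search_analytics(
    siteURL    = "https://www.example.com",
    startDate  = dates[d],
    endDate    = dates[d],
    dimensions = c("date", "page"),
    rowLimit   = 5000,
    walk_data  = "byBatch"
  )
  # pause between requests so a long run doesn't hammer the API quota
  Sys.sleep(5)
}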
I did it... and no luck. Even with some spacing between calls (say, 5 seconds), I still get bitten by the same rbind error.
I think I need to see some more code. I don't understand why you don't set your rowLimit to 99999 or similar so it gets all the rows; it should batch it for you. If it doesn't get any results, it returns a data.frame of NAs instead, and that may be the source of your trouble. I also tend to wrap the call in tryCatch(). But add some code and I can take a better look.
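The empty-result case could be guarded with something like this (untested sketch; is_empty_result is just a helper name I'm making up, and the site URL/dimensions are placeholders):

# TRUE when the API gave back nothing usable: not a data.frame,
# zero rows, or the all-NA placeholder frame
is_empty_result <- function(df) {
  !is.data.frame(df) || nrow(df) == 0 || all(is.na(df))
}

data <- search_analytics(
  siteURL    = "https://www.example.com",
  startDate  = Sys.Date() - 3,
  endDate    = Sys.Date() - 3,
  dimensions = c("date", "page"),
  rowLimit   = 99999,
  walk_data  = "byBatch"
)

if (!is_empty_result(data)) {
  # safe to keep this day's data and combine it later
}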
Oh OK, the code is in the zip. In the future, could you paste the code directly into the question?
The pertinent bit seems to be:

# configuration: row retrieval limit
config.gsc.rowLimit <- 5000

....

# extract Google Search Console (GSC) data
rowCount <- 0
rowLimit <- config.gsc.rowLimit

repeat {
  data <- search_analytics(
    siteURL = config.gsc.domain,
    startDate = date_range[d],
    endDate = date_range[d],
    dimensions = s,
    rowLimit = rowLimit,
    walk_data = "byBatch"
  )

  if (is.data.frame(data)) {
    rowCount <- nrow(data)
  }

  if (rowCount == 0 || rowCount < rowLimit) {
    break
  } else {
    rowLimit <- rowLimit + config.gsc.rowLimit
    Sys.sleep(5)
  }
}

which I think could be:

config.gsc.rowLimit <- 9999999L

....

data <- tryCatch(
  search_analytics(
    siteURL = config.gsc.domain,
    startDate = date_range[d],
    endDate = date_range[d],
    dimensions = s,
    rowLimit = rowLimit,
    walk_data = "byBatch"
  ),
  error = function(err) {
    message("Problem fetching data, returning NULL")
    NULL
  }
)
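Then, downstream, you only keep the days that actually came back before stacking anything, e.g. (sketch; results is just a placeholder for however you accumulate the per-day output):

if (!is.null(data) && is.data.frame(data)) {
  # only keep results that actually came back as a data.frame
  results[[as.character(date_range[d])]] <- data
}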
That solution isn't right... at least not for my case. I've tried your example and I had to manually stop it. Here's the short log from RStudio:
By default, the date range is Sys.Date() - 3 to match GSC and, in this case, the dataset contains 33224 rows for one of the dimension sets (date, page, country and device in this case). It doesn't seem right to me to gather N batches because, with the row limit set to 9999999L and the batch size being 5000 IIRC, that means processing ~2000 calls to gather just 33224 rows :P
Maybe it would be useful to provide a function that makes a first call to the API to return the total number of rows for a given query; then we could set the limit to that value. The call count would be (N / 5000) + 1, nowhere near ~2000.
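To make the arithmetic concrete (using the numbers above):

9999999 / 5000        # ~2000 potential 5000-row batches with the hard-coded limit
(33224 %/% 5000) + 1  # 7 calls if the limit could be sized from a known row count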
Ah, OK, fair enough. The problem is that the API doesn't return the total number of rows like the Google Analytics API does, so you can't plan how many calls you will need, AFAIK.
Hi,
I'm writing an R extractor for both Google Analytics (GA) and Google Search Console (GSC) and I'm running into a problem I don't know how to solve.
As I don't know how many rows I will get for a given day in GSC, I extract the data inside a repeat loop. I request a default number of rows (5000) and, if the number of rows returned matches the row limit, I retry the extraction with the row limit increased by another 5000 rows on each iteration, so it fetches 5000, 10000, 15000, 20000, ..., N rows.
If a given day has less than 5000 rows, it works well. But if it happens to have more than that, I tend to get the following error message in the console:
Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
When I'm not getting that, sometimes the extraction seems to be stuck and, after some time, it gives me the following error message in the console:
Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached
This happens to me once in a while, but never in an apparently predictable fashion. It either times out after successfully fetching 12 or 13 results or it times out after the first retrieval.
I've attached a sample in case you can lend me a hand. It's split into files: the -base.R file initializes some project data, the -upload.R file is responsible for the extraction and upload, and the other file just specifies the data for the customer I want to extract from.
Can you give it a look?
Thanks in advance,
Julio :)
sample.zip