Querying one day of data at a time only gives 5000 rows #48

Open
jceallonardo opened this issue Dec 18, 2018 · 5 comments

@jceallonardo

What goes wrong

When running search_analytics on 1 day, the result appears to cap out at 5,000 rows, even with rowLimit = 25000.

I know an issue regarding 5000 rows was created a few years ago, but this might be a different problem since Google recently upped the max rowLimit to 25,000.

Steps to reproduce the problem

searchConsoleR version 0.3.0.9000
googleAuthR version 0.7.0.9000

uri <- "https://www.mydomain.com/"
start <- Sys.Date() - 4
end <- Sys.Date() - 4
dims <- c("query")
listwebs <- list_websites()
data <- search_analytics(siteURL = uri,
                         startDate = start,
                         endDate = end,
                         dimensions = dims,
                         rowLimit = 25000)

Expected output

data.frame with more than 5,000 obs.

Actual output

data.frame with exactly 5,000 obs.

I have tried with multiple domains, and it outputs 5,000 rows every time.

Verbose output:

Fetching search analytics for url: https://www.mydomain.com/ dates: 2018-12-14 2018-12-14 dimensions: query dimensionFilterExp: searchType: web aggregationType: auto
2018-12-18 16:15:05> Token exists.
2018-12-18 16:15:05> Request: https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.mydomain.com%2F/searchAnalytics/query
2018-12-18 16:15:05> Body JSON parsed to: {"startDate":"2018-12-14","endDate":"2018-12-14","dimensions":["query"],"searchType":"web","dimensionFilterGroups":[{"groupType":"and","filters":[]}],"aggregationType":"auto","rowLimit":25000}

Session Info

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] searchConsoleR_0.3.0.9000

loaded via a namespace (and not attached):
[1] rstudioapi_0.8 magrittr_1.5 R6_2.3.0 httr_1.4.0
[5] tools_3.5.1 pkgbuild_1.0.2 cli_1.0.1 googleAuthR_0.7.0.9000
[9] withr_2.1.2 remotes_2.0.2 openssl_1.1 yaml_2.2.0
[13] assertthat_0.2.0 digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4
[17] processx_3.2.1 callr_3.1.0 ps_1.2.1 curl_3.2
[21] memoise_1.1.0 compiler_3.5.1 backports_1.1.3 prettyunits_1.0.2
[25] jsonlite_1.6

@jceallonardo
Author

Another important note: I believe this has implications for batching with "byDate", since a similar 5,000-row limit is reached per day even though the package states that 25,000 rows are being fetched.

@MarkEdmondson1234
Owner

I can't reproduce this. It gets 25000 rows per batch for me when I use byBatch and 25000 per day when I use byDate:

my_example <- "http://www.example.co.uk"
sa2 <- search_analytics(my_example, startDate = Sys.Date() - 10, 
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byBatch", rowLimit = 50000)
# 50000 rows
nrow(sa2)


sa3 <- search_analytics(my_example, startDate = Sys.Date() - 5, endDate = Sys.Date() - 3,
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byDate")

# 75000 rows
nrow(sa3)

@jceallonardo
Author

I get your outputs when I include all of the dimensions you do, but try running your query again with just the "date" and "query" dimensions.

@MarkEdmondson1234
Owner

Yes I see now:

sa2 <- search_analytics(my_example, startDate = Sys.Date() - 5, dimensions = c("date", "query"), walk_data = "byDate")
Fetching search analytics for url: https://www.world-first.co.uk/ dates: 2018-12-14 2018-12-16 dimensions: date query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-12-19 15:19:14> Request #: 2018-12-14
2018-12-19 15:19:17> Request #: 2018-12-15
2018-12-19 15:19:19> Request #: 2018-12-16

# 15000 rows
nrow(sa2)

Hmm, well, there is nothing in the code that does this, so I guess it's the API itself limiting the results when you query only those dimensions. If that's true, a Python call will return similar results. If it's verified, perhaps it should be lodged as a bug with the Search Console API team.

@jceallonardo
Author

Yeah. I just ran a test w/ Python and got the same. Weird. I don't recall this being an issue before.
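
For anyone wanting to check their own results for the same pattern, one option is to count the rows returned for each date. The helper below is only a minimal sketch and is not part of searchConsoleR: `rows_per_day` and its `cap` argument are hypothetical names, and it assumes the result has a `date` column, which should be the case when "date" is among the requested dimensions.

```r
# Hypothetical helper (not part of searchConsoleR): count rows per date and
# flag the days whose row count equals a suspected cap.
rows_per_day <- function(df, cap = 5000) {
  # Assumption: "date" was one of the requested dimensions.
  stopifnot("date" %in% names(df))
  counts <- as.data.frame(table(date = as.character(df$date)),
                          stringsAsFactors = FALSE)
  names(counts) <- c("date", "rows")
  # TRUE for days that returned exactly `cap` rows.
  counts$at_cap <- counts$rows == cap
  counts
}
```

Calling `rows_per_day(sa2)` on the byDate result above would show whether every day returned exactly 5,000 rows.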

@MarkEdmondson1234 MarkEdmondson1234 changed the title One day of data caps at 5,000 rows Using 'query' and 'date' dimensions only gives 5000 rows Dec 19, 2018
@MarkEdmondson1234 MarkEdmondson1234 changed the title Using 'query' and 'date' dimensions only gives 5000 rows Query one day of data at a time only gives 5000 rows Dec 19, 2018
@MarkEdmondson1234 MarkEdmondson1234 changed the title Query one day of data at a time only gives 5000 rows Querying one day of data at a time only gives 5000 rows Dec 19, 2018