Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chapter 6 - Test File #1

Open
safeisrisky opened this issue Aug 8, 2020 · 4 comments
Open

Chapter 6 - Test File #1

safeisrisky opened this issue Aug 8, 2020 · 4 comments

Comments

@safeisrisky
Copy link

Hi,

I think there is a bug in the test file for Chapter 6 - tiny_python_projects/06_wc/test.py/

The following is a test function

def test_more():
    """Test on more than one file"""
    rv, out = getstatusoutput(f'{prg} {fox} {sonnet}')
    expected = ('       1       9      45 ../inputs/fox.txt\n'
                '      17     118     661 ../inputs/sonnet-29.txt\n'
                '      18     127     706 total')
    assert rv == 0
    assert out.rstrip() == expected

I think the above test function should have been

def test_more():
    """Test on more than one file"""
    rv, out = getstatusoutput(f'{prg} {fox} {sonnet}')
    expected = ('       1       9      45 ../inputs/fox.txt\n'
                '      17     118     669 ../inputs/sonnet-29.txt\n'
                '      18     127     714 total')
    assert rv == 0
    assert out.rstrip() == expected
@AlbertUlysses
Copy link

AlbertUlysses commented Jan 29, 2021

Yes, I can also confirm this. I think the method the python script uses disregards some bytes. I double checked using the Linux's wc command
@kyclark
do you have any input on this?

@AlbertUlysses
Copy link

AlbertUlysses commented Jan 29, 2021

I just saw in the book the solution is correct on page 111, when it uses the linux's wc command it has the 714 total, but the test.py in the repo is wrong.
However, when we enter solution.py into the test.py in the repo it passes all the test.
when we compare the two:

>>> import os
>>> len(open('../inputs/sonnet-29.txt').read())
661
>>> os.path.getsize('../inputs/sonnet-29.txt')
669
>>> 

We can see that the len of the file's string isn't the same as a byte size.
Basically the different between the two is explained here:

A. To count number of characters in str object, you can use len() function:

>>> print(len('please anwser my question'))
25

B. To get memory size in bytes allocated to store str object, you can use sys.getsizeof() function

>>> from sys import getsizeof
>>> print(getsizeof('please anwser my question'))
50

source: https://stackoverflow.com/questions/4967580/how-to-get-the-size-of-a-string-in-python

@kyclark
Copy link
Owner

kyclark commented Jan 30, 2021

Well, first off, my apologies for taking so long to address the original bug @safeisrisky. Yes, I can see there is a discrepancy between how I'm counting bytes using the length of a string and the actual size of the string. At this point, I can't fix the book, so I think it will just have to remain "wrong" where here "wrong" means "not the same as wc." It's unfortunate, but the point of the exercise is to get the reader to think about lines, words, and characters. I hate that I missed the distinction between characters and memory! Thanks for pointing this out.

@AlbertUlysses
Copy link

AlbertUlysses commented Feb 4, 2021

Hi!
I wanted to say one last thing, which is mostly for anyone curious about this. There isn't anything "wrong" with the code. The "issue" occurs because Python strings are in latin1, and the text file (sonnet-29.txt) has UTF8 characters (the apostrophes). So when we read the string, it returns a slightly smaller number. The easiest way to "fix" this is to change one variable in the solution code to:

num_bytes += len(line.encode('UTF8'))

With this change, there should be a change in the test.py

Also, thank you @kyclark for all the work you did on this book. I'm really enjoying it.

ZakiRucker pushed a commit to ZakiRucker/tiny_python_projects that referenced this issue Sep 12, 2021
NowHappy pushed a commit to NowHappy/tiny_python_projects that referenced this issue Oct 1, 2021
NowHappy pushed a commit to NowHappy/tiny_python_projects that referenced this issue Oct 1, 2021
Yarpirates20 added a commit to Yarpirates20/tiny_python_projects that referenced this issue Dec 23, 2022
Fixes kyclark#1
Use map function when assigning low, high in read_csv to split the string at the '-' character, and convert the strings to integers using int() function, the first map argument.

Fixes kyclark#2
Initiate empty list as wod. Use a foor loop of random.sample(exercises, ...)  which creates a list of tuples. Assign 3 variables for each iteration of the loop: exercise, low, and high.

Pick a random number between low and high, and append the tuple to wod. If the easy flag is given as argument, divide reps by two. Use the int() function to convert the result to an integer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants