
Add a cache mechanism #62

Open
femtotrader wants to merge 7 commits into trickvi:master from femtotrader:cache

Conversation

@femtotrader
Contributor

Should fix #61

@femtotrader
Contributor Author

Please don't merge.

Some tests are failing.

Here is some sample code:

import datapackage
import requests_cache
import datetime
#session = None
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=datetime.timedelta(days=60))
print(session)
#datapkg = datapackage.DataPackage('http://data.okfn.org/data/cpi/')
datapkg = datapackage.DataPackage('http://data.okfn.org/data/cpi/', session=session)

print(datapkg)

import pandas as pd
print(pd.DataFrame(list(datapkg.data)))

After submitting this PR, I also noticed that the last line

print(pd.DataFrame(list(datapkg.data)))

raises

Traceback (most recent call last):
  File "sample/example_cache.py", line 14, in <module>
    print(pd.DataFrame(list(datapkg.data)))
  File "/Users/scls/github/femto/datapackage/datapackage/datapackage.py", line 705, in get_data
    next(reader)
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
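
For illustration only, here is one way this kind of error can be reproduced and avoided. It is a minimal Python 3 sketch under assumptions: the sample payload and the helper names are hypothetical, and whether this is what actually happens inside `get_data` is not confirmed. The idea is that `csv.reader` expects an iterable of lines, so a payload containing bare carriage returns that is handed over as one chunk triggers the error, while wrapping it in `StringIO` splits it into lines first.

```python
import csv
import io

# Hypothetical sample data: two CSV rows separated by bare carriage returns.
text = "Date,CPI\r2015-01,100.0\r"


def parse_as_single_chunk(data):
    """Feed the whole payload to csv.reader as if it were one line."""
    return list(csv.reader([data]))


def parse_via_stringio(data):
    """Wrap the payload in StringIO so it is split into lines first."""
    # newline="" leaves line endings untranslated but still splits
    # on "\r", "\n" and "\r\n" when iterating.
    return list(csv.reader(io.StringIO(data, newline="")))


try:
    parse_as_single_chunk(text)
except csv.Error as exc:
    print("csv.Error:", exc)

print(parse_via_stringio(text))
```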

@femtotrader
Contributor Author

Fixed now by using StringIO.
The cache is working fine!

But some tests are still failing:

======================================================================
ERROR: Try reading a datapackage from the web
----------------------------------------------------------------------
Traceback (most recent call last):
  File "//anaconda/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "//anaconda/lib/python3.4/unittest/mock.py", line 1136, in patched
    return func(*args, **keywargs)
  File "/Users/scls/github/femto/datapackage/tests/test_datapackage.py", line 221, in test_web_url
    self.dpkg = datapackage.DataPackage('http://data.okfn.org/data/cpi/')
  File "/Users/scls/github/femto/datapackage/datapackage/datapackage.py", line 108, in __init__
    super(DataPackage, self).__init__(**descriptor)
TypeError: o.keys() are not iterable

======================================================================
ERROR: test_resource.TestDatapackage.test_open_resource_local
----------------------------------------------------------------------
Traceback (most recent call last):
  File "//anaconda/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/scls/github/femto/datapackage/tests/test_resource.py", line 648, in test_open_resource_local
    list(dpkg.data) # Force the iteration over the iterable returned from data property.
  File "/Users/scls/github/femto/datapackage/datapackage/datapackage.py", line 700, in get_data
    resource_file = compat.StringIO(resource_file.text)
TypeError: initial_value must be str or None, not MagicMock

======================================================================
ERROR: test_resource.TestDatapackage.test_open_resource_url
----------------------------------------------------------------------
Traceback (most recent call last):
  File "//anaconda/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "//anaconda/lib/python3.4/unittest/mock.py", line 1136, in patched
    return func(*args, **keywargs)
  File "/Users/scls/github/femto/datapackage/tests/test_resource.py", line 642, in test_open_resource_url
    list(dpkg.data) # Force the iteration over the iterable returned from data property.
  File "/Users/scls/github/femto/datapackage/datapackage/datapackage.py", line 700, in get_data
    resource_file = compat.StringIO(resource_file.text)
TypeError: initial_value must be str or None, not MagicMock

Any help is welcome
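
The last two errors look like the mocked response's `text` attribute is itself a `MagicMock`, which `StringIO` rejects. A minimal sketch of that situation, using only the standard library, is below. The names `fake_response` and `read_text` are hypothetical and are not taken from the test suite; how the real tests patch the HTTP call may differ.

```python
import io
from unittest import mock

# Hypothetical stand-in for a patched HTTP response object.
fake_response = mock.MagicMock()


def read_text(response):
    """Mimic the failing line: wrap the response body in StringIO."""
    return io.StringIO(response.text).read()


# Unconfigured: response.text is itself a MagicMock, so StringIO rejects it.
try:
    read_text(fake_response)
except TypeError as exc:
    print("TypeError:", exc)

# Configured: give the mock a real string body before it is used.
fake_response.text = "Date,CPI\n2015-01,100.0\n"
print(read_text(fake_response))
```

If this is indeed the cause, the tests would need to set a string body on whatever object the patched call returns.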

@femtotrader
Contributor Author

Now using the raw response (instead of StringIO). The requests documentation describes raw as: "File-like object representation of response (for advanced usage). Use of raw requires that stream=True be set on the request."
http://docs.python-requests.org/en/latest/api/#requests.Response.raw
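
A rough Python 3 sketch of reading CSV rows from `raw` follows. It is not the code in this branch: `rows_from_raw` is a hypothetical helper, UTF-8 is an assumed encoding, and a `BytesIO` stands in for the network response so the snippet runs without any HTTP call. With requests, the object passed in would come from something like `requests.get(url, stream=True)`.

```python
import csv
import io
from types import SimpleNamespace


def rows_from_raw(response, encoding="utf-8"):
    """Read CSV rows from a response's raw file-like object."""
    # On a real requests response, this asks urllib3 to undo gzip/deflate
    # content encoding while reading from raw.
    response.raw.decode_content = True
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(response.raw, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# Stand-in for a streamed response: only the .raw attribute is used here.
fake = SimpleNamespace(raw=io.BytesIO(b"Date,CPI\r\n2015-01,100.0\r\n"))
print(rows_from_raw(fake))
```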

But there is still a problem with

$ nosetests -s -v tests/test_datapackage.py:TestDatapackage.test_web_url

@trickvi
Owner

trickvi commented Sep 16, 2015

Hi @femtotrader, I just wanted to say that you're not pushing things into the void. I just haven't had the time to review properly, but I will try to make time real soon.

@femtotrader
Contributor Author

Thanks.
I think having a cache mechanism is very important, as it will allow testing whole datapackages without making several requests, so it should help with #63.

My own opinion about a roadmap is:

@femtotrader
Contributor Author

@trickvi @pwalsh

I wonder if we could add requests (http://www.python-requests.org/en/latest/) as a dependency. It would make tests easier to manage with requests-mock:
https://pypi.python.org/pypi/requests-mock

@trickvi
Owner

trickvi commented Sep 17, 2015

@femtotrader yes, I've been thinking about this for the last few days. When I started this project out I wanted little to no dependencies, but I am now thinking about introducing dependencies, so we might go for requests.

Many aeons ago I thought about using requests but decided to go with the good ol' standard urllib because it allowed me to open both local and remote paths in the same call and return a file handle. Unfortunately requests didn't, but there's a way around that. So I'm +1 on introducing it to make dev life easier.
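
One possible shape for such a workaround, as a Python 3 sketch only: `open_resource` is a hypothetical helper, not an existing function in this project, and the UTF-8 encoding for local files is an assumption. It returns a text file-like object either way, and accepts an optional session (for example a `requests_cache.CachedSession`) for remote paths.

```python
import io
from urllib.parse import urlparse


def open_resource(path, session=None):
    """Return a text file-like object for a local path or an http(s) URL."""
    if urlparse(path).scheme in ("http", "https"):
        # Third-party dependency, imported lazily so that opening
        # local paths works even when requests is not installed.
        import requests

        http = session if session is not None else requests
        response = http.get(path)
        response.raise_for_status()
        return io.StringIO(response.text)
    # Local path: newline="" leaves line endings for the csv module to handle.
    return open(path, newline="", encoding="utf-8")
```

Callers can then treat both cases the same, for example `with open_resource(path) as handle: ...`.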

@femtotrader
Contributor Author

Nice decision, which should lead to a lot of code "cleanup".

@femtotrader
Contributor Author

I made a branch that uses requests only (no more urlopen), but some tests are still failing.

https://github.com/femtotrader/datapackage/tree/cache_requests_only

Any ideas?

Development

Successfully merging this pull request may close these issues.

Datapackage should provide a cache mechanism

2 participants