Releases: ruippeixotog/scala-scraper
v2.0.0
- Breaking changes
  - Extracting using a CSS query string as an extractor now extracts elements instead of text. This makes it easier
    to chain extractors and CSS selectors and fits the current extractor model better. The old behavior can be
    recovered by wrapping the CSS query string in the `texts` content extractor, e.g. `doc >> texts("myQuery")`;
  - `HtmlExtractor`, `HtmlValidator` and `ElementQuery` now have an additional type parameter for the type of
    `Element` they work on. If you have custom instances of one of those classes, filling the missing parameter with
    `Element` (which is a superclass of all elements) should be enough for them to work with all source code using
    scala-scraper 1.x;
  - Methods for loading extractors and validators from a config were extracted to a separate module. To use them,
    users must add `scala-scraper-config` to their SBT dependencies and import
    `net.ruippeixotog.scalascraper.config.dsl.DSL._`;
  - The implicit conversion of `Validated`/`Either` to a `RightProjection`, which exposes `foreach`, `map` and
    `flatMap` in for comprehensions, was moved to a separate object that is not imported together with the DSL.
    Either upgrade to Scala 2.12 (in which `Either` is already right-biased) or import the new
    `net.ruippeixotog.scalascraper.util.EitherRightBias` support object.
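The first breaking change can be sketched as follows. This is a minimal illustration rather than an excerpt from the library: it assumes the jsoup-backed `JsoupBrowser`, and the inline HTML, the `.item` class and the `QueryStringMigration` object name are all made up for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object QueryStringMigration {
  // Hypothetical markup, used only for illustration.
  private val html =
    """<html><body><p class="item">first</p><p class="item">second</p></body></html>"""

  private val browser = JsoupBrowser()
  private val doc = browser.parseString(html)

  // 2.0.0: a bare CSS query string extracts the matching elements,
  // so the text has to be read from each element explicitly.
  val viaElements: List[String] = (doc >> ".item").map(_.text).toList

  // Wrapping the query in `texts` recovers the 1.x behavior (text content).
  val viaTexts: List[String] = (doc >> texts(".item")).toList
}
```

Under these assumptions both values should hold the same two strings; code migrating from 1.x would typically only need the `texts(...)` form.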
- Deprecations
  - `SimpleExtractor` and `SimpleValidator` are now deprecated. The classes remain available for the time being, but
    DSL methods that returned those classes now return only `HtmlExtractor` and `HtmlValidator` instances;
  - The `Validated` type alias is now deprecated. Users should now use `Either`, `Right` and `Left` directly;
  - The `asDate` content parser was deprecated in favor of `asLocalDate` and `asDateTime`;
  - The DSL validation operator `~/~` was renamed to `>/~` in order to have the same precedence as the extraction
    operators `>>` and `>?>`;
  - The `and` DSL operator is deprecated and will be removed in future versions.
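To illustrate what using `Either`, `Right` and `Left` directly looks like, here is a small sketch that relies only on the Scala standard library, not on scala-scraper. It assumes Scala 2.12 or later, where `Either` is right-biased; the `parsePort` and `portPair` helpers are hypothetical names invented for the example.

```scala
object EitherMigration {
  // Hypothetical validation: Right carries the accepted value, Left the error.
  def parsePort(raw: String): Either[String, Int] =
    try {
      val n = raw.trim.toInt
      if (n > 0 && n < 65536) Right(n) else Left(s"port out of range: $n")
    } catch {
      case _: NumberFormatException => Left(s"not a number: $raw")
    }

  // On Scala 2.12+ Either is right-biased, so it works in a for
  // comprehension without any extra import; the first Left short-circuits.
  def portPair(from: String, to: String): Either[String, (Int, Int)] =
    for {
      a <- parsePort(from)
      b <- parsePort(to)
    } yield (a, b)
}
```

On Scala 2.11 the same for comprehension would need the `EitherRightBias` support object mentioned under the breaking changes.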
- New features
  - The concrete type of the models in scala-scraper is now passed down from the `Browser` to `Element` instances
    extracted from documents. This allows users to use features unique to each browser (such as modifying or
    interacting with elements) while still using the scala-scraper DSL to extract and query them;
  - `HtmlExtractor[E, A]` is now a proper instance of `ElementQuery[E] => A` and has `map` and `mapQuery` methods to
    map the extraction results and the preceding query, respectively;
  - Content extractors, which were previously just functions, are now full-fledged `HtmlExtractor` instances and can
    be used by themselves, e.g. `doc >> elements`, `doc >> elementList("myQuery") >> formData`;
  - A new `PolyHtmlExtractor` class was created, allowing the implementation of extractors whose return type depends
    on the type of the element or document being extracted;
  - Overall code cleanup and simplification of some concepts.
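The `map` method on extractors can be sketched like this. The example is an illustration under assumptions, not library documentation: it uses `JsoupBrowser`, a made-up HTML list, and hypothetical names (`ExtractorComposition`, `lengths`), and it assumes that `texts("li")` yields an `HtmlExtractor` whose result can be transformed with `map` as described above.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object ExtractorComposition {
  // Hypothetical markup, used only for illustration.
  private val html =
    """<html><body><ul><li>a</li><li>bb</li><li>ccc</li></ul></body></html>"""

  private val browser = JsoupBrowser()
  private val doc = browser.parseString(html)

  // Build a reusable extractor: take the text of every `li`,
  // then map the extraction result to the length of each string.
  private val lengths = texts("li").map(_.map(_.length).toList)

  // Apply the composed extractor to the document.
  val itemLengths: List[Int] = doc >> lengths
}
```

Keeping the transformation inside the extractor means the same value can be reused against several documents without repeating the post-processing step.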
v1.2.1
- Bug fixes
  - Fix type parameter usage in three-arg `>?>` DSL operator.
v1.2.0
- New features
  - Support for Scala 2.12;
  - New method `closeAll` in `HtmlUnitBrowser`, for closing opened windows;
  - New model `Node` representing a DOM node - in this library, either an `ElementNode` or a `TextNode`;
  - New methods `childNodes` and `siblingNodes` in `Element`.
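A short sketch of how `childNodes` and the `Node` model might be used together. It assumes `JsoupBrowser`, a made-up paragraph of HTML, and that `TextNode` and `ElementNode` can be pattern matched as case classes; the `NodeInspection` object and its fields are hypothetical names.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import net.ruippeixotog.scalascraper.model.{ ElementNode, TextNode }

object NodeInspection {
  // Hypothetical markup: a paragraph mixing text and an element.
  private val html =
    """<html><body><p>Hello <b>world</b>!</p></body></html>"""

  private val browser = JsoupBrowser()
  private val doc = browser.parseString(html)

  private val paragraph = doc >> element("p")

  // Classify each direct child of the paragraph by node kind.
  val nodeKinds: List[String] = paragraph.childNodes.toList.map {
    case TextNode(_) => "text"
    case ElementNode(el) => s"element:${el.tagName}"
    case _ => "other" // defensive fallback, not expected here
  }
}
```

Unlike `children`, which only returns elements, `childNodes` also exposes the text fragments between them.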
v1.1.0
- New features
  - New methods `clearCookies`, `parseInputStream` and `parseResource` in `Browser`;
  - New methods `hasAttr` and `siblings` in `Element`;
  - Support for SOCKS proxies.
- Bug fixes
  - Correct handling of missing name and value attributes in the `formData` extractor.
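The new `Element` methods can be tried out with a sketch like the one below. It assumes `JsoupBrowser` and an invented HTML list; the ids, the `data-x` attribute and the `SiblingInspection` object are hypothetical and exist only for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object SiblingInspection {
  // Hypothetical markup, used only for illustration.
  private val html =
    """<html><body><ul>
      |<li id="a">one</li>
      |<li id="b" data-x="1">two</li>
      |<li id="c">three</li>
      |</ul></body></html>""".stripMargin

  private val browser = JsoupBrowser()
  private val doc = browser.parseString(html)

  private val middle = doc >> element("#b")

  // hasAttr reports whether the attribute is present on the element.
  val hasDataX: Boolean = middle.hasAttr("data-x")
  val hasTitle: Boolean = middle.hasAttr("title")

  // siblings returns the other elements sharing the same parent.
  val siblingTexts: List[String] = middle.siblings.map(_.text).toList
}
```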
v1.0.0
First stable version.