Metadata-Version: 2.1
Name: uraltimber
Version: 0.30
Summary: Extract Query and Title information from URLs
Home-page: https://code.compassfoundation.io/dave/uraltimber
Author: Dave Burkholder
Author-email: dave@compassfoundation.io
License: SAP
Keywords: Security Appliance,Log Cabin
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: SAP
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown

Another project with a tacky pun for a name. Good solid timber is extracted from the Ural Mountains,
and interesting data can be extracted from URLs to enhance reports.


## URL Parsing

The typical entry point is the `URL` class. Pass a URL string to the `URL` class and call `.all()` to
extract all supported data features.

```python
from uraltimber import URL

url = URL('https://www.ebay.com/itm/Craftsman-299-piece-Ultimate-Easy-Read-Deep-Standard-SAE-Metric-Socket-Set/302015586131?epid=23019373702&hash=item4651881f53:g:qUEAAOSw-CpX-R1P:rk:2:pf:0')

url.all()

{'default_mimetype': '',
 'domain_name': 'ebay.com',
 'extension': '',
 'hostname': 'www.ebay.com',
 'params': {'epid': ['23019373702'],
  'hash': ['item4651881f53:g:qUEAAOSw-CpX-R1P:rk:2:pf:0']},
 'search_term': '',
 'title': 'Craftsman 299 piece Ultimate Easy Read Deep Standard SAE Metric Socket Set'}

```

The `default_mimetype` key shown above is useful when parsing blocked log lines, where no mimetype is
available. The path extension or other properties of the URL can often be used to make a reasonable
guess at the URL's likely mimetype.
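
To make the idea concrete, here is a minimal sketch using only the Python standard library. It is
not how `uraltimber` is implemented; the function name `sketch_url_features` is hypothetical, and the
real `extension` value may be formatted differently (for example, without the leading dot).

```python
import mimetypes
import posixpath
from urllib.parse import parse_qs, urlsplit


def sketch_url_features(url: str) -> dict:
    """Rough stdlib approximation of a few keys returned by `URL(...).all()`."""
    parts = urlsplit(url)
    # Extension of the last path segment, e.g. '.jpg' (empty string if none).
    extension = posixpath.splitext(parts.path)[1]
    # guess_type returns (type, encoding); type is None when nothing matches.
    guessed, _ = mimetypes.guess_type(parts.path)
    return {
        'hostname': parts.hostname or '',
        'extension': extension,
        # parse_qs maps each parameter name to a list of values.
        'params': parse_qs(parts.query),
        'default_mimetype': guessed or '',
    }


features = sketch_url_features('https://example.com/images/photo.jpg?size=large')
```

A URL with no recognizable extension simply yields an empty `default_mimetype`, which matches the
shape of the eBay example above.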


## Mimetype Parsing

When parsing logs, mimetypes can come in the typically messy myriad manifestations that necessitate
data cleanup. The `Mimetype` class normalizes malformed mimetypes and standardizes the experimental
(`x-`) and vendor (`vnd.`) mimetypes to shorter forms.

```python
from uraltimber import Mimetype

mime = Mimetype('application/x-javascript')

mime.clean
'application/javascript'

mime.mtype
'application'

mime.subtype
'javascript'

mime.hit_code
20

```
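
The kind of cleanup involved can be sketched in a few lines. This is an illustration only, not the
`Mimetype` class itself: `normalize_mimetype` is a hypothetical helper that handles case, trailing
parameters, and the experimental `x-` prefix, and it does not attempt vendor types or `hit_code`.

```python
def normalize_mimetype(raw: str) -> str:
    """Lower-case a mimetype, drop its parameters, and strip an `x-` prefix."""
    # Drop parameters such as '; charset=utf-8' and any surrounding whitespace.
    essence = raw.split(';', 1)[0].strip().lower()
    mtype, _, subtype = essence.partition('/')
    # Experimental subtypes carry an 'x-' prefix; remove it for the short form.
    if subtype.startswith('x-'):
        subtype = subtype[2:]
    return f'{mtype}/{subtype}' if subtype else mtype


cleaned = normalize_mimetype('Application/X-JavaScript; charset=UTF-8')
```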

## Extractors

The `URL` class is powered by the `Extractor` class and its subclasses. Extractors must be written 
on a URL-by-URL basis to extract the desired attributes.

All extractors must inherit from the `Extractor` class and define the necessary regexes. Check out 
the numerous examples in the `uraltimber/extractors` module.
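
As a rough picture of the pattern, the sketch below defines a stand-in base class and one subclass.
Everything here is an assumption made for illustration: the stand-in `Extractor`, the attribute names
`domain_name` and `title_regex`, and the `title()` method are hypothetical and may not match the real
base class, so consult the existing extractors for the actual interface.

```python
import re
from urllib.parse import unquote, urlsplit


class Extractor:
    """Stand-in base class for this sketch; NOT the real uraltimber Extractor."""

    # Subclasses override these; both attribute names are hypothetical.
    domain_name = ''
    title_regex = None

    def matches(self, hostname: str) -> bool:
        # Accept the bare domain or any subdomain of it.
        return hostname == self.domain_name or hostname.endswith('.' + self.domain_name)

    def title(self, url: str) -> str:
        parts = urlsplit(url)
        if self.title_regex is None or not self.matches(parts.hostname or ''):
            return ''
        match = self.title_regex.search(parts.path)
        if match is None:
            return ''
        # Turn a slug like 'Socket-Set' into 'Socket Set'.
        return unquote(match.group('title')).replace('-', ' ')


class EbayItemExtractor(Extractor):
    domain_name = 'ebay.com'
    # Matches paths of the form /itm/<slug>/<numeric id>.
    title_regex = re.compile(r'^/itm/(?P<title>[^/]+)/\d+')


title = EbayItemExtractor().title('https://www.ebay.com/itm/Craftsman-Socket-Set/302015586131')
```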

All extractors must be tested! Add test fixture data for each new `Extractor` you write, with at
least one fixture per attribute extracted. Check the numerous fixtures in the 
`uraltimber/tests/fixtures` directory.
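
The "one fixture per attribute" rule can be checked mechanically. The sketch below assumes a simple
fixture shape (`url`, `attribute`, `expected`) and uses a stand-in `extract` function; both are
hypothetical and are not the format used in the project's fixtures directory.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical fixture shape: one record per attribute being checked.
FIXTURES = [
    {'url': 'https://example.com/search?q=socket+set',
     'attribute': 'search_term', 'expected': 'socket set'},
    {'url': 'https://example.com/search?q=socket+set',
     'attribute': 'hostname', 'expected': 'example.com'},
]


def extract(url: str) -> dict:
    """Stand-in extractor used only to exercise the fixtures."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return {
        'hostname': parts.hostname or '',
        'search_term': params.get('q', [''])[0],
    }


def check_fixtures(fixtures) -> set:
    """Assert every fixture passes and return the attributes that were covered."""
    covered = set()
    for fixture in fixtures:
        actual = extract(fixture['url'])[fixture['attribute']]
        assert actual == fixture['expected'], fixture
        covered.add(fixture['attribute'])
    return covered


covered = check_fixtures(FIXTURES)
# Attributes the extractor produces that no fixture exercises.
missing = set(extract(FIXTURES[0]['url'])) - covered
```

An empty `missing` set means every extracted attribute has at least one fixture.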



