# pat-dissertation
Code for Pat's dissertation.
## sorter.rb usage
### Options
sorter.rb takes the following options:

| option          | usage                                              |
| --------------- | -------------------------------------------------- |
| -f, --file      | the name of the input file                         |
| -b, --bin-file  | the name of the bin csv file                       |
| -t, --type      | what type of splitting to do, can be "iat" or "pn" |
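
As a rough illustration of how these flags could be parsed, here is a minimal sketch using Ruby's standard `OptionParser`. The `parse_options` helper and the hash keys are assumptions made for this example; the actual parsing code in sorter.rb may look different.

```ruby
require "optparse"

# Hypothetical helper (not taken from sorter.rb): parses the three
# documented flags from an argument list and returns them as a hash.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: sorter.rb [options]"
    opts.on("-f", "--file FILE", "the name of the input file") do |value|
      options[:file] = value
    end
    opts.on("-b", "--bin-file FILE", "the name of the bin csv file") do |value|
      options[:bin_file] = value
    end
    # Restricting the value to the two documented modes is an assumption.
    opts.on("-t", "--type TYPE", ["iat", "pn"], 'splitting type, "iat" or "pn"') do |value|
      options[:type] = value
    end
  end
  parser.parse(argv) # raises OptionParser::InvalidArgument for an unknown type
  options
end
```

Calling `parse_options(%w[--file tester.txt --bin-file bins.csv --type iat])` returns a hash with `:file`, `:bin_file`, and `:type` set to the given values.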
### Output
sorter.rb will generate two files, where `[filename]` is the input file name without its extension:

* [filename]-out.json
* [filename]-out.csv

Both files contain the same data, one in JSON format and one in CSV format.
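
To make "the same data in two formats" concrete, here is a minimal sketch that serializes one set of rows both ways with Ruby's standard `json` and `csv` libraries. The `render_outputs` helper and the row shape (an array of hashes that share the same keys) are assumptions for illustration only; this README does not specify the real layout of the output.

```ruby
require "json"
require "csv"

# Hypothetical helper (not taken from sorter.rb): renders the same rows as
# JSON text and as CSV text. `rows` is assumed to be an array of hashes that
# all share the same keys.
def render_outputs(rows)
  csv_text = CSV.generate do |csv|
    csv << rows.first.keys unless rows.empty? # header row from the first hash
    rows.each { |row| csv << row.values }
  end
  { json: JSON.pretty_generate(rows), csv: csv_text }
end
```

With this sketch, each hash becomes one JSON object and one CSV line, so either file can be regenerated from the other.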
### Type options
The program has two filtering modes:
#### iat
This mode extracts all text in the input file between `PLOVEOPENING` and `PLOVECLOSING`.
Text before `PLOVEOPENING` and after `PLOVECLOSING` is ignored.
Only one such section per input file is supported.
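
The extraction described here can be sketched with a single non-greedy regular expression. The `extract_iat` helper is hypothetical, and trimming the surrounding whitespace and returning `nil` when a marker is missing are assumptions; sorter.rb may behave differently in those cases.

```ruby
# Hypothetical helper (not taken from sorter.rb): returns the text between
# the first PLOVEOPENING and the next PLOVECLOSING, or nil if either marker
# is missing. The /m flag lets "." match newlines.
def extract_iat(text)
  match = text.match(/PLOVEOPENING(.*?)PLOVECLOSING/m)
  match && match[1].strip
end
```

Because only the first match is used, any later `PLOVEOPENING` section in the same file is ignored, which mirrors the single-section limitation.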
#### pn
This mode extracts each section of text in the input file between `Narrative:` and `Signatures:`.
It supports multiple sections in a single input file.
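
Collecting several sections can be sketched with `String#scan`, which returns every non-overlapping match. The `extract_pn` helper is hypothetical, and stripping whitespace from each section is an assumption rather than documented behavior.

```ruby
# Hypothetical helper (not taken from sorter.rb): returns every block of text
# found between a "Narrative:" marker and the next "Signatures:" marker.
def extract_pn(text)
  text.scan(/Narrative:(.*?)Signatures:/m).flatten.map(&:strip)
end
```

An input with no `Narrative:`/`Signatures:` pair yields an empty array in this sketch.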
### Example
`./sorter.rb --file tester.txt --bin-file bins.csv --type iat`

The above command reads `tester.txt`, counts strings according to `bins.csv`, and processes the input text in `iat` mode.
It creates `tester-out.json` and `tester-out.csv` containing the output data.
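
The output names in this example follow from the input name. A minimal sketch of that derivation is below; the `output_paths` helper is hypothetical, and dropping any directory part of the input path is an assumption.

```ruby
# Hypothetical helper (not taken from sorter.rb): derives the two output file
# names from the input path by dropping the directory and the extension.
def output_paths(input_path)
  base = File.basename(input_path, ".*")
  { json: "#{base}-out.json", csv: "#{base}-out.csv" }
end

output_paths("tester.txt")
# => {json: "tester-out.json", csv: "tester-out.csv"}
```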