# pat-dissertation
Code for Pat's dissertation.
## sorter.rb usage
### Options
sorter.rb takes the following options:
| option | usage |
| --------------- | -------------------------------------------------- |
| -f, --file | the name of the input file |
| -b, --bin-file | the name of the bin csv file |
| -t, --type | the type of splitting to do: "iat" or "pn" |
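
As an illustration of how these three options could be parsed, here is a minimal sketch using Ruby's standard `OptionParser`. The helper name `parse_options` and the option keys (`:file`, `:bin_file`, `:type`) are assumptions made for this example, not necessarily what sorter.rb does internally.

```ruby
require "optparse"

# Hypothetical helper (not taken from sorter.rb): parses the three
# documented options from an argument list and returns them as a hash.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: sorter.rb [options]"
    opts.on("-f", "--file FILE", "the name of the input file") { |v| options[:file] = v }
    opts.on("-b", "--bin-file FILE", "the name of the bin csv file") { |v| options[:bin_file] = v }
    # Passing an array restricts --type to the two documented values;
    # any other value raises OptionParser::InvalidArgument.
    opts.on("-t", "--type TYPE", %w[iat pn], 'the type of splitting: "iat" or "pn"') { |v| options[:type] = v }
  end
  parser.parse(argv) # parse without modifying argv
  options
end
```

Called as `parse_options(ARGV)`, this would return a hash such as `{ file: "tester.txt", bin_file: "bins.csv", type: "iat" }`.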
### Output
sorter.rb will generate two files:
* [filename]-out.json
* [filename]-out.csv
Both files contain the same data, one in JSON format and one in CSV format.
`[filename]` is the name of the input file without its extension.
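
The naming scheme and the "same data, two formats" idea can be sketched roughly as follows. The helper names `output_paths` and `serialize`, the `bin`/`count` column headers, and the assumption that the data is a hash of bin names to counts are all hypothetical. The actual structure written by sorter.rb may differ.

```ruby
require "json"
require "csv"

# Hypothetical helper: derives both output names from the input name,
# e.g. "tester.txt" -> "tester-out.json" and "tester-out.csv".
def output_paths(input_path)
  base = File.basename(input_path, ".*") # drop directory and extension
  ["#{base}-out.json", "#{base}-out.csv"]
end

# Hypothetical helper: renders the same data once as JSON and once as CSV.
# Assumes `counts` is a hash of bin name => count (an illustrative shape).
def serialize(counts)
  json = JSON.pretty_generate(counts)
  csv = CSV.generate do |out|
    out << %w[bin count] # assumed header row
    counts.each { |bin, count| out << [bin, count] }
  end
  [json, csv]
end
```

Writing each string to the matching path with `File.write` would then produce the two output files.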
### Type options
The `--type` option selects one of two filtering modes:
#### iat
This mode extracts all text from the input file between `PLOVEOPENING` and `PLOVECLOSING`.
It ignores all text before `PLOVEOPENING` and after `PLOVECLOSING`.
It does not support multiple sections of text in one input file.
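
The `iat` behavior described above could look something like the sketch below. The function name `extract_iat`, the whitespace trimming, and returning `nil` when a marker is missing are assumptions for illustration. Only the two marker strings come from this README.

```ruby
# Hypothetical helper: returns the text between the first PLOVEOPENING
# and the next PLOVECLOSING, or nil if either marker is missing.
def extract_iat(text)
  opening = "PLOVEOPENING"
  closing = "PLOVECLOSING"

  start = text.index(opening)
  return nil unless start

  start += opening.length
  finish = text.index(closing, start) # search only after the opening marker
  return nil unless finish

  text[start...finish].strip # trimming whitespace is an assumption
end
```

Because only the first marker pair is used, any later `PLOVEOPENING` section in the same file is ignored, which matches the single-section limitation.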
#### pn
This mode extracts each section of text from the input file between `Narrative:` and `Signatures:`.
It supports multiple sections in a single input file.
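
A minimal sketch of the `pn` behavior, assuming a non-greedy multiline regular expression is an acceptable way to find each section. The function name `extract_pn` and the whitespace trimming are hypothetical choices for this example.

```ruby
# Hypothetical helper: returns an array with the text of every section
# found between "Narrative:" and the next "Signatures:".
def extract_pn(text)
  # /m lets "." match newlines; ".*?" stops at the nearest "Signatures:".
  text.scan(/Narrative:(.*?)Signatures:/m).flatten.map(&:strip)
end
```

For an input containing two `Narrative:` ... `Signatures:` pairs, this returns a two-element array, one string per section.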
### Example
`./sorter.rb --file tester.txt --bin-file bins.csv --type iat`
The above command reads `tester.txt`, counts strings according to `bins.csv`, and processes the input text in `iat` mode.
It creates `tester-out.json` and `tester-out.csv`, which contain the output data.