Sybren Stuvel wrote:
Fuzzyman enlightened us with:
My worry is that if '\n' *doesn't* signify a line break on the Mac,
then it may exist in the body of the text, and trigger
``ending = '\n'`` prematurely?
I'd count the number of occurrences of '\r\n', '\n' without a preceding
'\r', and '\r' without a following '\n', and let the majority decide.
This is what I came up with. As you can see from the docstring, it
attempts to do sensible(-ish) things in the event of a tie, or of no line
endings at all.
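The tie-breaking rule amounts to comparing (count, priority) pairs: the larger count wins, and the priority only matters when counts are equal. A minimal sketch, assuming the counts already sit in a dict keyed by the ending literal; ``pick_ending`` and ``PRIORITY`` are illustrative names, not taken from the post.

```python
import os

# Tie-break order described in the post: '\n' beats '\r\n', which beats '\r'.
PRIORITY = {'\n': 3, '\r\n': 2, '\r': 1}


def pick_ending(counts, default=os.linesep):
    """Return the most frequent ending, or default if no endings were counted."""
    if not any(counts.values()):
        return default
    # Compare (count, priority) pairs, so priority only decides a tie.
    return max(counts, key=lambda ending: (counts[ending], PRIORITY[ending]))


# A tie between '\n' and '\r' goes to '\n'.
print(repr(pick_ending({'\n': 3, '\r\n': 0, '\r': 3})))  # '\n'
```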
Comments/corrections welcome. I know the tests aren't very useful
(because they make no *assertions*, they won't tell you if it breaks),
but you can see what's going on:
import os
import re

# Each pattern matches one kind of line ending and nothing else.
rn = re.compile('\r\n')         # CR LF (Windows)
r = re.compile('\r(?!\n)')      # CR not followed by LF (classic Mac)
n = re.compile('(?<!\r)\n')     # LF not preceded by CR (Unix)

# Sequence of (regex, literal, priority) for each line ending
line_ending = [(n, '\n', 3), (rn, '\r\n', 2), (r, '\r', 1)]

def find_ending(text, default=os.linesep):
    r"""
    Given a piece of text, use a simple heuristic to determine the line
    ending in use.

    Returns the value assigned to default if no line endings are found.
    This defaults to ``os.linesep``, the native line ending for the
    platform.

    If there is a tie between two endings, the priority chain is
    ``'\n', '\r\n', '\r'``.
    """
    results = [(len(exp.findall(text)), priority, literal) for
               exp, literal, priority in line_ending]
    if not sum(count for count, _, _ in results):
        return default
    # Highest count wins; on a tie, the higher priority wins.
    return max(results)[2]
if __name__ == '__main__':
    tests = [
        '\n\r \n\r \n\r',
    ]
    for entry in tests:
        print(repr(entry), repr(find_ending(entry)))
All the best,