Assembly: BioSharp.Core (in BioSharp.Core.dll) Version: 0.1.3191.26120 (0.1.0.0)
Syntax
C# |
---|
public static string CreateRegex( ISymbolList motif ) |
Parameters
- motif
- Type: BioSharp.Core.Bio.Symbol.ISymbolList
The motif, as an ISymbolList, to convert into a regular expression.
Return Value
a String regular expression.
Remarks
Ambiguous Symbols are simply transformed into character classes, and, as the examples show, runs of identical Symbols are written with a repeat count. For example, the nucleotide sequence "AAGCTT" becomes "A{2}GCT{2}" and "CTNNG" is expanded to "CT[ABCDGHKMNRSTVWY]{2}G". The character class is generated by calling the GetMatches method of the ambiguity symbol to obtain the alphabet of AtomicSymbols it matches, calling GetAllSymbols on this alphabet, removing any gap symbols and then tokenizing the remainder. Tokens within a character class appear in ascending numerical order of their character values, as determined by sorting the token array.
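The transformation described above can be illustrated with a minimal, self-contained sketch. It does not use the BioSharp types: the motif is a plain string, and the `Classes` table, the `MotifRegexSketch` class and the `CreateRegexSketch` method are hypothetical stand-ins for the GetMatches / GetAllSymbols lookup. Only the N class quoted in the remarks is included, and the run-length handling is inferred from the two examples rather than taken from the library source.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MotifRegexSketch
{
    // Hypothetical stand-in for the symbol model: maps an ambiguity token to
    // the tokens that go into its character class. Only N is listed, using
    // the class quoted in the remarks.
    private static readonly Dictionary<char, string> Classes =
        new Dictionary<char, string>
        {
            { 'N', "ABCDGHKMNRSTVWY" }
        };

    public static string CreateRegexSketch(string motif)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < motif.Length)
        {
            char c = motif[i];

            // Measure the run of identical tokens starting at position i.
            int run = 1;
            while (i + run < motif.Length && motif[i + run] == c)
            {
                run++;
            }

            string cls;
            if (Classes.TryGetValue(c, out cls))
            {
                // Ambiguity token: emit a character class whose tokens are
                // sorted in ascending numerical order.
                char[] tokens = cls.ToCharArray();
                Array.Sort(tokens);
                sb.Append('[').Append(tokens).Append(']');
            }
            else
            {
                // Unambiguous token: emit it literally.
                sb.Append(c);
            }

            // Runs longer than one are written with a repeat count.
            if (run > 1)
            {
                sb.Append('{').Append(run).Append('}');
            }
            i += run;
        }
        return sb.ToString();
    }
}
```

With this sketch, `MotifRegexSketch.CreateRegexSketch("AAGCTT")` yields `A{2}GCT{2}` and `MotifRegexSketch.CreateRegexSketch("CTNNG")` yields `CT[ABCDGHKMNRSTVWY]{2}G`, matching the two examples in the remarks.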
The IAlphabet of the ISymbolList must be finite and must have a character token type. Regular expressions may be generated for any such ISymbolList, not just DNA, RNA and protein.
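Because the return value is an ordinary regular expression string, it can presumably be passed to the standard .NET `System.Text.RegularExpressions.Regex` class to search a tokenized sequence. The sketch below assumes the sequence is already available as a plain string and uses the two patterns quoted in the remarks as literals; `MotifSearchSketch` and `FirstMatchIndex` are hypothetical names, not part of BioSharp.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MotifSearchSketch
{
    // Returns the zero-based start index of the first match of the pattern
    // in the sequence, or -1 if the pattern does not occur.
    public static int FirstMatchIndex(string sequence, string pattern)
    {
        Match m = Regex.Match(sequence, pattern);
        return m.Success ? m.Index : -1;
    }
}
```

For example, `MotifSearchSketch.FirstMatchIndex("GGAAGCTTCC", "A{2}GCT{2}")` returns 2, and `MotifSearchSketch.FirstMatchIndex("ACTAAGT", "CT[ABCDGHKMNRSTVWY]{2}G")` returns 1.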